Metacontrol for Adaptive Imagination-Based Optimization
Authors
Abstract
Many machine learning systems are built to solve the hardest examples of a particular task, which often makes them large and expensive to run, especially with respect to the easier examples, which might require much less computation. For an agent with a limited computational budget, this “one-size-fits-all” approach may result in the agent wasting valuable computation on easy examples while not spending enough on hard examples. Rather than learning a single, fixed policy for solving all instances of a task, we introduce a metacontroller which learns to optimize a sequence of “imagined” internal simulations over predictive models of the world in order to construct a more informed, and more economical, solution. The metacontroller component is a model-free reinforcement learning agent, which decides both how many iterations of the optimization procedure to run and which model to consult on each iteration. The models (which we call “experts”) can be state transition models, action-value functions, or any other mechanism that provides information useful for solving the task, and can be learned on-policy or off-policy in parallel with the metacontroller. When the metacontroller, controller, and experts were trained with “interaction networks” (Battaglia et al., 2016) as expert models, our approach was able to solve a challenging decision-making problem under complex non-linear dynamics. The metacontroller learned to adapt the amount of computation it performed to the difficulty of the task, and learned how to choose which experts to consult by factoring in both their reliability and their individual computational resource costs. This allowed the metacontroller to achieve a lower overall cost (task loss plus computational cost) than more traditional fixed-policy approaches. These results demonstrate that our approach is a powerful framework for using rich forward models for efficient model-based reinforcement learning.
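The trade-off described in the abstract (how many iterations to run, which expert to consult, and a total cost equal to task loss plus computational cost) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the one-dimensional task, the `Expert` class with its `accuracy` and `cost` fields, the `run_episode` helper, and the hand-written `adaptive_policy`, which stands in for the learned reinforcement learning metacontroller.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Expert:
    """Hypothetical expert: corrects a fraction of the remaining error."""
    name: str
    accuracy: float  # fraction of the remaining error removed per consultation
    cost: float      # computational cost charged per consultation


# A policy maps (difficulty, steps taken so far) to an expert index,
# or to None to stop optimizing and act.
Policy = Callable[[float, int], Optional[int]]


def run_episode(target: float, policy: Policy, experts: List[Expert],
                max_steps: int = 10) -> Tuple[float, int]:
    """Run one toy episode; return (total cost, number of iterations)."""
    x = 0.0                       # the controller's initial proposal
    spent = 0.0                   # accumulated computational cost
    steps = 0
    difficulty = abs(target - x)  # toy stand-in for task difficulty
    while steps < max_steps:
        choice = policy(difficulty, steps)
        if choice is None:
            break
        expert = experts[choice]
        # "Imagined" evaluation: the expert's feedback refines the proposal.
        x += expert.accuracy * (target - x)
        spent += expert.cost
        steps += 1
    task_loss = (target - x) ** 2
    return task_loss + spent, steps


def fixed_policy(difficulty: float, steps: int) -> Optional[int]:
    """One-size-fits-all baseline: always two calls to the accurate expert."""
    return 1 if steps < 2 else None


def adaptive_policy(difficulty: float, steps: int) -> Optional[int]:
    """Hand-written stand-in for a learned metacontroller (an assumption)."""
    if steps >= 1 or difficulty < 0.2:
        return None               # easy task or already refined: stop
    return 0 if difficulty < 0.5 else 1  # cheap expert for medium tasks


EXPERTS = [Expert("cheap", accuracy=0.5, cost=0.01),
           Expert("accurate", accuracy=1.0, cost=0.05)]

if __name__ == "__main__":
    for target in (0.1, 0.3, 2.0):
        fixed_total, fixed_steps = run_episode(target, fixed_policy, EXPERTS)
        adapt_total, adapt_steps = run_episode(target, adaptive_policy, EXPERTS)
        print(f"target={target}: fixed={fixed_total:.4f} ({fixed_steps} steps), "
              f"adaptive={adapt_total:.4f} ({adapt_steps} steps)")
```

In this sketch the adaptive rule skips computation on the easy instance, uses the cheap expert on the medium one, and pays for the accurate expert only on the hard one. That mirrors, in a much simplified form, the behavior the abstract attributes to the trained metacontroller.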
Similar resources
RELIABILITY-BASED DESIGN OPTIMIZATION OF COMPLEX FUNCTIONS USING SELF-ADAPTIVE PARTICLE SWARM OPTIMIZATION METHOD
A Reliability-Based Design Optimization (RBDO) framework is presented that accounts for stochastic variations in structural parameters and operating conditions. The reliability index calculation is itself an iterative process, potentially employing an optimization technique to find the shortest distance from the origin to the limit-state boundary in a standard normal space. Monte Carlo simulati...
Airfoil Shape Optimization with Adaptive Mutation Genetic Algorithm
An efficient method for scattering Genetic Algorithm (GA) individuals in the design space is proposed to accelerate airfoil shape optimization. The method used here is based on the variation of the mutation rate for each gene of the chromosomes by taking feedback from the current population. An adaptive method for airfoil shape parameterization is also applied and its impact on the optimum desi...
Optimal Placement and Sizing of DGs and Shunt Capacitor Banks Simultaneously in Distribution Networks using Particle Swarm Optimization Algorithm Based on Adaptive Learning Strategy
Abstract: Optimization of DG and capacitors is a nonlinear optimization problem with equality and inequality constraints, and the efficiency of meta-heuristic methods for solving optimization problems of any degree of complexity has been proven. As the population grows and electricity consumption increases, the need for generation increases, which further reduces voltage, increases los...
Adaptive Rule-Base Influence Function Mechanism for Cultural Algorithm
This study proposes a modified version of cultural algorithms (CAs) which benefits from a rule-based system for the influence function. This rule-based system selects and applies the suitable knowledge source according to the distribution of the solutions. It is important to apply an appropriate influence function to a specific individual, according to its role in the search process. This rule ...
A limited memory adaptive trust-region approach for large-scale unconstrained optimization
This study concerns a trust-region-based method for solving unconstrained optimization problems. The approach takes advantage of the compact limited memory BFGS updating formula together with an appropriate adaptive radius strategy. In our approach, the adaptive technique leads us to decrease the number of subproblems solved, while utilizing the structure of limited memory quasi-Newt...
Journal title:
- CoRR
Volume abs/1705.02670, Issue
Pages -
Publication date 2017